Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is however not
trivial if the appearance of objects produced by each sensor differs
substantially. This problem is further complicated when parallax effects cannot
be ignored when using close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online.
Comment: Preprint accepted for publication in IJCV (December 2018).
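The alternating scheme described above can be illustrated with a toy example: two coupled scalar energies, each minimized in closed form while the other variable's provisional value acts as a dynamic prior. The energies and coupling weight below are hypothetical stand-ins, not the paper's actual cost functions.

```python
def alternating_minimization(steps=50, lam=0.5):
    """Toy sketch of alternating minimization linked by dynamic priors.

    Two coupled quadratic energies (hypothetical, for illustration only):
      E_seg(s; d) = (s - 1)^2 + lam * (s - d)^2   # "segmentation" energy
      E_reg(d; s) = (d - 2)^2 + lam * (d - s)^2   # "registration" energy
    Each step minimizes one energy in closed form while the other
    variable's provisional value serves as the dynamic prior.
    """
    s, d = 0.0, 0.0
    for _ in range(steps):
        s = (1.0 + lam * d) / (1.0 + lam)  # argmin over s of E_seg(s; d)
        d = (2.0 + lam * s) / (1.0 + lam)  # argmin over d of E_reg(d; s)
    return s, d
```

With lam = 0.5 the iterates converge quickly to the joint fixed point (s, d) = (1.25, 1.75); each update is simply the derivative of its own energy set to zero with the other variable held fixed.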
β-Multivariational Autoencoder for Entangled Representation Learning in Video Frames
It is crucial to choose actions from an appropriate distribution while
learning a sequential decision-making process in which a set of actions is
expected given the states and previous rewards. Yet, if there are more than two
latent variables and every pair of variables has a covariance value, learning a
known prior from data becomes challenging: when the data are large and
diverse, many posterior estimation methods suffer from posterior collapse. In this
paper, we propose the β-Multivariational Autoencoder (βMVAE) to
learn a Multivariate Gaussian prior from video frames for use as part of
single-object tracking framed as a decision-making process. We present a novel
formulation for object motion in videos with a set of dependent parameters to
address the single-object tracking task. The true values of the motion parameters
are obtained through data analysis on the training set. The parameter
population is then assumed to follow a Multivariate Gaussian distribution. The
βMVAE is developed to learn this entangled prior
directly from frame patches, where the output is the object mask of each frame
patch. We devise a bottleneck to estimate the posterior's parameters.
Via a new reparameterization trick, we learn the likelihood
as the object mask of the input. Furthermore, we replace the
neural network of βMVAE with a U-Net architecture and name the new
network Multivariational U-Net (βMVUnet). Our networks are trained
from scratch on over 85k video frames for 24 million (βMVUnet) and 78 million
(βMVAE) steps. We show that βMVUnet improves both posterior
estimation and segmentation over the test set. Our code and the
trained networks are publicly released.
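The full-covariance reparameterization at the heart of such a model can be sketched in plain NumPy: sampling z = μ + Lε, with L a Cholesky factor, yields correlated ("entangled") latents while keeping the sample differentiable in μ and L. The dimensions and covariance values below are invented for illustration; this is not the paper's network.

```python
import numpy as np

def reparameterize_full_cov(mu, L, rng):
    """Sample z = mu + L @ eps with eps ~ N(0, I).

    L is a lower-triangular Cholesky factor, so cov(z) = L @ L.T:
    every pair of latent variables can carry a nonzero covariance
    (an "entangled" prior), while the sample stays differentiable
    in mu and L.
    """
    eps = rng.standard_normal(mu.shape[0])
    return mu + L @ eps

rng = np.random.default_rng(0)
mu = np.zeros(2)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])     # hypothetical correlated 2-D latent
L = np.linalg.cholesky(cov)
samples = np.stack([reparameterize_full_cov(mu, L, rng) for _ in range(20000)])
emp_cov = np.cov(samples.T)                  # empirical covariance approaches cov
```

The empirical covariance of the samples recovers the off-diagonal 0.8 term, which a diagonal-covariance VAE could not represent.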
Future Video Prediction from a Single Frame for Video Anomaly Detection
Video anomaly detection (VAD) is an important but challenging task in
computer vision. The main challenge arises from the rarity of training samples
with which to model all anomaly cases. Hence, semi-supervised anomaly detection methods
have received more attention, since they focus on modeling normal behavior and
detect anomalies by measuring the deviations from normal patterns. Despite
impressive advances of these methods in modeling normal motion and appearance,
long-term motion modeling has not been effectively explored so far. Inspired by
the abilities of the future frame prediction proxy-task, we introduce the task
of future video prediction from a single frame, as a novel proxy-task for video
anomaly detection. This proxy-task alleviates the challenges of previous
methods in learning longer motion patterns. Moreover, we replace the initial
and future raw frames with their corresponding semantic segmentation maps, which
not only makes the method aware of object class but also makes the prediction
task less complex for the model. Extensive experiments on the benchmark
datasets (ShanghaiTech, UCSD-Ped1, and UCSD-Ped2) show the effectiveness of the
method and the superiority of its performance compared to SOTA prediction-based
VAD methods.
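Prediction-based VAD methods commonly turn prediction quality into an anomaly score, for example via the PSNR between predicted and observed frames: the worse the prediction, the more anomalous the frame. A minimal sketch follows; the scoring function is a generic illustration, not the paper's exact formulation.

```python
import numpy as np

def psnr(pred, actual, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted and an observed frame."""
    mse = np.mean((pred - actual) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def anomaly_score(pred, actual):
    """Poorly predicted (low-PSNR) frames get high anomaly scores."""
    return -psnr(pred, actual)

# Synthetic frames: a well-predicted frame scores lower than a badly predicted one.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
noise = rng.standard_normal((64, 64))
well_predicted = np.clip(frame + 0.01 * noise, 0, 1)
badly_predicted = np.clip(frame + 0.30 * noise, 0, 1)
```

In practice the per-frame scores are min-max normalized over a video before thresholding, so only the ordering of scores matters.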
Reproducible Evaluation of Pan-Tilt-Zoom Tracking
Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in
computer vision for many years. However, it is very difficult to assess the
progress that has been made on this topic because there is no standard
evaluation methodology. The difficulty in evaluating PTZ tracking algorithms
arises from their dynamic nature. In contrast to other forms of tracking, PTZ
tracking involves both locating the target in the image and controlling the
motors of the camera to aim it so that the target stays in its field of view.
This type of tracking can only be performed online. In this paper, we propose a
new evaluation framework based on a virtual PTZ camera. With this framework,
tracking scenarios do not change for each experiment and we are able to
replicate online PTZ camera control and behavior including camera positioning
delays, tracker processing delays, and numerical zoom. We tested our evaluation
framework with the Camshift tracker to show its viability and to establish
baseline results.
Comment: This is an extended version of the 2015 ICIP paper "Reproducible
Evaluation of Pan-Tilt-Zoom Tracking".
The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe
The preponderance of matter over antimatter in the early Universe, the
dynamics of the supernova bursts that produced the heavy elements necessary for
life and whether protons eventually decay --- these mysteries at the forefront
of particle physics and astrophysics are key to understanding the early
evolution of our Universe, its current state and its eventual fate. The
Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed
plan for a world-class experiment dedicated to addressing these questions. LBNE
is conceived around three central components: (1) a new, high-intensity
neutrino source generated from a megawatt-class proton accelerator at Fermi
National Accelerator Laboratory, (2) a near neutrino detector just downstream
of the source, and (3) a massive liquid argon time-projection chamber deployed
as a far detector deep underground at the Sanford Underground Research
Facility. This facility, located at the site of the former Homestake Mine in
Lead, South Dakota, is approximately 1,300 km from the neutrino source at
Fermilab -- a distance (baseline) that delivers optimal sensitivity to neutrino
charge-parity symmetry violation and mass ordering effects. This ambitious yet
cost-effective design incorporates scalability and flexibility and can
accommodate a variety of upgrades and contributions. With its exceptional
combination of experimental configuration, technical capabilities, and
potential for transformative discoveries, LBNE promises to be a vital facility
for the field of particle physics worldwide, providing physicists from around
the globe with opportunities to collaborate in a twenty to thirty year program
of exciting science. In this document we provide a comprehensive overview of
LBNE's scientific objectives, its place in the landscape of neutrino physics
worldwide, the technologies it will incorporate and the capabilities it will
possess.
Comment: Major update of previous version. This is the reference document for the
LBNE science program and its current status. Chapters 1, 3, and 9 provide a
comprehensive overview of LBNE's scientific objectives, its place in the
landscape of neutrino physics worldwide, the technologies it will incorporate,
and the capabilities it will possess. 288 pages, 116 figures.
Multi-Task Learning based Video Anomaly Detection with Attention
Multi-task learning based video anomaly detection methods combine multiple
proxy tasks in different branches to detect video anomalies in different
situations. Most existing methods either do not combine complementary tasks to
effectively cover all motion patterns, or do not explicitly consider the class
of the objects. To address the aforementioned shortcomings, we propose a
novel multi-task learning based method that combines complementary proxy tasks
to better consider the motion and appearance features. We combine the semantic
segmentation and future frame prediction tasks in a single branch to learn the
object class and consistent motion patterns, and to detect respective anomalies
simultaneously. In the second branch, we add several attention mechanisms to
detect motion anomalies with attention to object parts, the direction of
motion, and the distance of the objects from the camera. Our qualitative
results show that the proposed method considers the object class effectively
and learns motion with attention to the aforementioned important factors, which
results in precise motion modeling and better motion anomaly detection.
Additionally, quantitative results show the superiority of our method compared
with state-of-the-art methods.
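Multi-branch detectors typically need a late-fusion step to combine per-branch anomaly scores that live on different scales. One plausible scheme, not claimed to be the paper's, is to z-normalize each branch's scores and then take a weighted sum:

```python
import numpy as np

def fuse_branch_scores(score_lists, weights=None):
    """Fuse per-frame anomaly scores from several branches.

    Each branch's scores are z-normalized so branches with different
    ranges contribute comparably, then combined with a weighted sum
    (a common late-fusion scheme; equal weights here are a placeholder).
    """
    scores = [np.asarray(s, dtype=float) for s in score_lists]
    if weights is None:
        weights = [1.0] * len(scores)
    fused = np.zeros_like(scores[0])
    for w, s in zip(weights, scores):
        std = s.std()
        fused += w * ((s - s.mean()) / std if std > 0 else np.zeros_like(s))
    return fused

# Two branches on very different scales, both spiking at frame 3.
fused = fuse_branch_scores([[0, 0, 0, 5, 0], [10, 10, 10, 40, 10]])
peak = int(np.argmax(fused))
```

Because each branch is normalized first, the large-magnitude branch cannot drown out the small one; both spikes reinforce the same peak frame.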
Extraction of volumetric structures in an illuminance image
An original method is proposed to extract the most significant volumetric structures in an illuminance image. The method proceeds in three levels of organization managed by generic grouping principles: (i) from the illuminance image to a more compact representation of its contents, by generic structural information extraction leading to a basic contour primitive map; (ii) grouping of the basic primitives in order to form intermediate primitives, the contour junctions; (iii) grouping of these junctions in order to build the high-level contour primitives, the generic volumetric structures. Experimental results for various images of cluttered scenes show an ability to properly extract the structures of volumetric objects or parts with planar and curved surfaces.
Generic Multi-scale Segmentation and Curve Approximation Method
We propose a new complete method to extract significant description(s) of planar curves according to constant curvature segments. This method is based (i) on a multi-scale segmentation and curve approximation algorithm, defined by two grouping processes (polygonal and constant curvature approximations), leading to a multi-scale covering of the curve, and (ii) on an intra- and inter-scale classification of this multi-scale covering, guided by heuristically defined qualitative labels, leading to pairs (scale, list of constant curvature segments) that best describe the shape of the curve. Experiments show that the proposed method is able to provide salient segmentation and approximation results which respect shape description and recognition criteria. 1 Introduction: In order to easily manipulate a planar curve or databases composed of planar curves, it would be interesting to represent data according to primitives which describe them in a way that respects the actual shape for recognition and co…
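A polygonal approximation stage of the kind mentioned above is often implemented as a recursive split on the point of maximum deviation (Ramer-Douglas-Peucker style). The sketch below is a generic version of that idea, not the paper's exact grouping process.

```python
def polygonal_approx(points, tol=0.1):
    """Approximate a planar curve by a polygon whose vertices keep every
    point within `tol` of its chord (recursive max-deviation split,
    a generic stand-in for a polygonal approximation grouping stage)."""
    def point_seg_dist(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        dx, dy = bx - ax, by - ay
        norm = (dx * dx + dy * dy) ** 0.5
        if norm == 0:
            return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
        # perpendicular distance from p to the line through a and b
        return abs(dy * (px - ax) - dx * (py - ay)) / norm

    if len(points) < 3:
        return list(points)
    dists = [point_seg_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1  # worst interior point
    if dists[i - 1] <= tol:
        return [points[0], points[-1]]       # chord is a good enough fit
    left = polygonal_approx(points[: i + 1], tol)
    right = polygonal_approx(points[i:], tol)
    return left[:-1] + right                 # drop the duplicated split point
```

Varying `tol` over a range of values yields exactly the kind of multi-scale covering of the curve that the classification stage can then select from.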